Sample size determination is the act of choosing the number of observations or replicates to include in a statistical sample. The sample size is an important feature of any empirical study in which the goal is to make inferences about a population from a sample. In practice, the sample size used in a study is usually determined based on the cost, time, or convenience of collecting the data, and the need for it to offer sufficient statistical power. In complicated studies there may be several different sample sizes: for example, in a stratified survey there would be different sizes for each stratum. In a census, data is sought for an entire population, hence the intended sample size is equal to the population size. In experimental design, where a study may be divided into different treatment groups, there may be different sample sizes for each group.

Sample sizes may be chosen in several ways:
*using experience: small samples, though sometimes unavoidable, can result in wide confidence intervals and risk of errors in statistical hypothesis testing.
*using a target variance for an estimate to be derived from the sample eventually obtained, i.e., if a high precision is required (narrow confidence interval) this translates to a low target variance of the estimator.
*using a target for the power of a statistical test to be applied once the sample is collected.
*using a confidence level, i.e., the larger the required confidence level, the larger the sample size (given a constant precision requirement).

Introduction

Larger sample sizes generally lead to increased precision when estimating unknown parameters. For example, if we wish to know the proportion of a certain species of fish that is infected with a pathogen, we would generally have a more precise estimate of this proportion if we sampled and examined 200 rather than 100 fish. Several fundamental facts of mathematical statistics describe this phenomenon, including the law of large numbers and the central limit theorem.

In some situations, the increase in precision for larger sample sizes is minimal, or even non-existent. This can result from the presence of systematic errors or strong dependence in the data, or if the data follows a heavy-tailed distribution.

Sample sizes may be evaluated by the quality of the resulting estimates. For example, if a proportion is being estimated, one may wish to have the 95% confidence interval be less than 0.06 units wide. Alternatively, sample size may be assessed based on the power of a hypothesis test. For example, if we are comparing the support for a certain political candidate among women with the support for that candidate among men, we may wish to have 80% power to detect a difference in the support levels of 0.04 units.

Estimation

Estimation of a proportion

A relatively simple situation is estimation of a proportion. For example, we may wish to estimate the proportion of residents in a community who are at least 65 years old.

The estimator of a proportion is \hat p = X/n, where ''X'' is the number of 'positive' instances (e.g., the number of people out of the ''n'' sampled people who are at least 65 years old). When the observations are independent, this estimator has a (scaled) binomial distribution (and is also the sample mean of data from a Bernoulli distribution). The maximum variance of this distribution is 0.25, which occurs when the true parameter is ''p'' = 0.5. In practice, since ''p'' is unknown, the maximum variance is often used for sample size assessments. If a reasonable estimate for ''p'' is known, the quantity p(1-p) may be used in place of 0.25.

For sufficiently large ''n'', the distribution of \hat p will be closely approximated by a normal distribution. Using this and the Wald method for the binomial distribution yields a confidence interval of the form

: \left(\hat p - Z\sqrt{\frac{0.25}{n}}, \quad \hat p + Z\sqrt{\frac{0.25}{n}}\right),

where ''Z'' is a standard Z-score for the desired level of confidence (1.96 for a 95% confidence interval).

If we wish to have a confidence interval that is ''W'' units total in width (''W''/2 on each side of the sample mean), we will solve

: Z\sqrt{\frac{0.25}{n}} = W/2

for ''n'', yielding the sample size

: n = \frac{Z^2}{W^2},

in the case of using 0.5 as the most conservative estimate of the proportion. ''(Note: W/2 = margin of error.)''

In the figure ("Sample size proportions") one can observe how sample sizes for binomial proportions change given different confidence levels and margins of error.

Otherwise, the formula would be

: Z\sqrt{\frac{p(1-p)}{n}} = W/2,

which yields

: n = \frac{4Z^2 p(1-p)}{W^2}.

For example, if we are interested in estimating the proportion of the US population who supports a particular presidential candidate, and we want the width of the 95% confidence interval to be at most 2 percentage points (0.02), then we would need a sample size of (1.96)²/(0.02²) = 9604. It is reasonable to use the 0.5 estimate for ''p'' in this case because presidential races are often close to 50/50, and it is also prudent to use a conservative estimate. The margin of error in this case is 1 percentage point (half of 0.02).

The foregoing is commonly simplified. The interval

: \left(\hat p - 1.96\sqrt{\frac{0.25}{n}}, \quad \hat p + 1.96\sqrt{\frac{0.25}{n}}\right)

will form a 95% confidence interval for the true proportion. If this interval needs to be no more than ''W'' units wide, then, rounding 1.96 up to 2, the equation

: 4\sqrt{\frac{0.25}{n}} = W

can be solved for ''n'', yielding ''n'' = 4/''W''² = 1/''B''², where ''B'' = ''W''/2 is the error bound on the estimate, i.e., the estimate is usually given as ''within ± B''. For ''B'' = 10% one requires ''n'' = 100, for ''B'' = 5% one needs ''n'' = 400, for ''B'' = 3% the requirement approximates to ''n'' = 1000, while for ''B'' = 1% a sample size of ''n'' = 10000 is required. These numbers are quoted often in news reports of opinion polls and other sample surveys. However, reported figures may not match the exact calculated value, because sample sizes are rounded up: ''n'' is the minimum number of sample points needed to obtain the desired result, so the number of respondents must lie on or above that minimum.
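As a minimal sketch of the calculation above, the following Python function solves the Wald-interval equation for ''n'' and rounds up. The function name and its parameters are illustrative choices, not part of any established library; the critical value is taken from the standard library's `statistics.NormalDist` rather than the rounded 1.96.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(width, confidence=0.95, p=0.5):
    """Smallest n for which the Wald interval is at most `width` wide.

    `p` is a planning value for the proportion; the default 0.5 is the
    conservative (maximum-variance) choice described in the text.
    """
    # Two-sided critical value Z, roughly 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Solve Z * sqrt(p * (1 - p) / n) = width / 2 for n.
    n = 4 * z**2 * p * (1 - p) / width**2
    # Sample sizes are integers and must not fall below the minimum.
    return math.ceil(n)


# Presidential-poll example: 95% interval at most 0.02 wide.
print(sample_size_for_proportion(0.02))  # 9604
```

Passing a smaller planning value such as `p=0.2` reduces the required size, since p(1-p) is then below 0.25.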

Estimation of a mean

When estimating the population mean using an independent and identically distributed (iid) sample of size ''n'', where each data value has variance ''σ''², the standard error of the sample mean is:

: \frac{\sigma}{\sqrt{n}}.

This expression describes quantitatively how the estimate becomes more precise as the sample size increases. Using the central limit theorem to justify approximating the sample mean with a normal distribution yields a confidence interval of the form

: \left(\bar x - \frac{Z\sigma}{\sqrt{n}}, \quad \bar x + \frac{Z\sigma}{\sqrt{n}}\right),

where ''Z'' is a standard Z-score for the desired level of confidence (1.96 for a 95% confidence interval).

If we wish to have a confidence interval that is ''W'' units total in width (''W''/2 being the margin of error on each side of the sample mean), we would solve

: \frac{Z\sigma}{\sqrt{n}} = W/2

for ''n'', yielding the sample size

: n = \frac{4Z^2\sigma^2}{W^2}.

For example, if we are interested in estimating the amount by which a drug lowers a subject's blood pressure with a 95% confidence interval that is six units wide, and we know that the standard deviation of blood pressure in the population is 15, then the required sample size is

: \frac{4 \times 1.96^2 \times 15^2}{6^2} = 96.04,

which would be rounded up to 97, because the obtained value is the ''minimum'' sample size, and sample sizes must be integers and must lie on or above the calculated minimum.
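The same calculation can be sketched in Python. This is an illustrative helper under the assumptions of this section (known ''σ'', normal approximation); the function name is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(width, sigma, confidence=0.95):
    """Smallest n giving a confidence interval for the mean at most `width` wide."""
    # Two-sided critical value Z for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Solve Z * sigma / sqrt(n) = width / 2 for n, then round up.
    return math.ceil(4 * z**2 * sigma**2 / width**2)


# Blood-pressure example: interval six units wide, sigma = 15.
print(sample_size_for_mean(width=6, sigma=15))  # 97
```

Because the required ''n'' scales with 1/''W''², halving the target width roughly quadruples the sample size.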

Required sample sizes for hypothesis tests

A common problem faced by statisticians is calculating the sample size required to yield a certain power for a test, given a predetermined Type I error rate ''α''. As follows, this can be estimated by pre-determined tables for certain values, by Mead's resource equation, or, more generally, by the cumulative distribution function.

Tables

The table shown on the right can be used in a two-sample t-test to estimate the sample sizes of an experimental group and a control group that are of equal size, that is, the total number of individuals in the trial is twice that of the number given, and the desired significance level is 0.05 (Chapter 13, page 215). The parameters used are:
*The desired statistical power of the trial, shown in the column to the left.
*Cohen's d (= effect size), which is the expected difference between the means of the target values between the experimental group and the control group, divided by the expected standard deviation.
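Where no table is at hand, per-group sizes of this kind can be approximated from the same two parameters. The sketch below uses the standard normal approximation n ≈ 2(z_{α/2} + z_{power})²/d² and assumes a two-sided test; it is not the method used to build the table, and exact t-based values are typically slightly larger. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def per_group_size(cohens_d, power=0.80, alpha=0.05):
    """Approximate size of each of two equal groups (normal approximation)."""
    std_normal = NormalDist()
    # Assumption: two-sided test, so alpha is split across both tails.
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    # n per group = 2 * (z_alpha + z_power)^2 / d^2, rounded up.
    return math.ceil(2 * (z_alpha + z_power) ** 2 / cohens_d**2)


for d in (0.2, 0.5, 1.0):
    n = per_group_size(d)
    print(f"d = {d}: about {n} per group, {2 * n} in total")
```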

Mead's resource equation

Mead's resource equation is often used for estimating sample sizes of laboratory animals, as well as in many other laboratory experiments. It may not be as accurate as other methods of estimating sample size, but it gives a hint of the appropriate sample size where parameters such as expected standard deviations or expected differences in values between groups are unknown or very hard to estimate (online, page 29).

Each parameter in the equation is a number of degrees of freedom, so 1 is subtracted from the corresponding count before it is inserted into the equation. The equation is:

: E = N - B - T,

where:
*''N'' is the total number of individuals or units in the study (minus 1)
*''B'' is the ''blocking component'', representing environmental effects allowed for in the design (minus 1)
*''T'' is the ''treatment component'', corresponding to the number of treatment groups (including the control group) being used, or the number of questions being asked (minus 1)
*''E'' is the degrees of freedom of the ''error component'' and should be somewhere between 10 and 20.

For example, if a study using laboratory animals is planned with four treatment groups (''T''=3), with eight animals per group, making 32 animals total (''N''=31), without any further stratification (''B''=0), then ''E'' would equal 28, which is above the cutoff of 20, indicating that the sample size may be a bit too large, and six animals per group might be more appropriate.
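The resource equation is simple enough to check by hand, but a short Python sketch makes the "minus 1" bookkeeping explicit. The function name is hypothetical, and treating an unblocked design as a single block (so that ''B'' = 0) is an assumption made here to match the worked example.

```python
def mead_error_df(total_units, treatment_groups, blocks=1):
    """Error degrees of freedom E = N - B - T from Mead's resource equation."""
    n = total_units - 1        # N: total units minus 1
    t = treatment_groups - 1   # T: treatment groups minus 1
    b = blocks - 1             # B: assumed 0 when there is a single block
    return n - b - t


# Four groups of eight animals, no further stratification.
print(mead_error_df(total_units=32, treatment_groups=4))  # 28
# Six animals per group brings E down to the upper end of the 10-20 range.
print(mead_error_df(total_units=24, treatment_groups=4))  # 20
```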

Cumulative distribution function

Let ''X_i'', ''i'' = 1, 2, ..., ''n'' be independent observations taken from a normal distribution with unknown mean ''μ'' and known variance ''σ''². Consider two hypotheses, a null hypothesis:

: H_0:\mu=0

and an alternative hypothesis:

: H_a:\mu=\mu^*

for some 'smallest significant difference' ''μ''^* > 0. This is the smallest value for which we care about observing a difference. Now, if we wish to (1) reject ''H''₀ with a probability of at least 1 − ''β'' when ''H''_a is true (i.e., a power of 1 − ''β''), and (2) reject ''H''₀ with probability ''α'' when ''H''₀ is true, then we need the following.

If ''z''_''α'' is the upper ''α'' percentage point of the standard normal distribution, then

: \Pr(\bar x > z_\alpha \sigma/\sqrt{n} \mid H_0) = \alpha

and so

: 'Reject ''H''₀ if our sample average (\bar x) is more than z_\alpha \sigma/\sqrt{n}'

is a decision rule which satisfies (2). (This is a 1-tailed test.)

Now we wish for this to happen with a probability at least 1 − ''β'' when ''H''_a is true. In this case, our sample average will come from a normal distribution with mean ''μ''^*. Therefore, we require

: \Pr(\bar x > z_\alpha \sigma/\sqrt{n} \mid H_a) \geq 1-\beta

Through careful manipulation, this can be shown (see Statistical power § Example) to happen when

: n \geq \left(\frac{z_\alpha + \Phi^{-1}(1-\beta)}{\mu^*/\sigma}\right)^2

where \Phi is the normal cumulative distribution function.
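The bound on ''n'' can be evaluated numerically. The following is a minimal Python sketch for the one-tailed test with known ''σ'' described in this section; the function names are illustrative, and the inverse normal CDF comes from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def sample_size_one_tailed_z(mu_star, sigma, alpha=0.05, power=0.80):
    """Smallest n meeting the bound n >= ((z_alpha + Phi^-1(power)) / (mu*/sigma))^2."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)   # upper alpha point
    z_power = STD_NORMAL.inv_cdf(power)       # Phi^-1(1 - beta)
    return math.ceil(((z_alpha + z_power) / (mu_star / sigma)) ** 2)


def achieved_power(n, mu_star, sigma, alpha=0.05):
    """Pr(reject H0 | H_a) for the rule 'reject if mean > z_alpha * sigma / sqrt(n)'."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(z_alpha - mu_star * math.sqrt(n) / sigma)


# Hypothetical planning values: smallest difference 1, sigma 2.
n = sample_size_one_tailed_z(mu_star=1.0, sigma=2.0)
print(n, round(achieved_power(n, 1.0, 2.0), 3))
```

Evaluating `achieved_power` at `n` and at `n - 1` is a quick way to confirm that the returned size is the smallest one that reaches the target power.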

Stratified sample size

With more complicated sampling techniques, such as stratified sampling, the sample can often be split up into sub-samples. Typically, if there are ''H'' such sub-samples (from ''H'' different strata) then each of them will have a sample size ''n_h'', ''h'' = 1, 2, ..., ''H''. These ''n_h'' must conform to the rule that ''n''₁ + ''n''₂ + ... + ''n''_''H'' = ''n'' (i.e., that the total sample size is given by the sum of the sub-sample sizes). Selecting these ''n_h'' optimally can be done in various ways, using (for example) Neyman's optimal allocation.

There are many reasons to use stratified sampling: to decrease variances of sample estimates, to use partly non-random methods, or to study strata individually. A useful, partly non-random method would be to sample individuals where easily accessible, but, where not, sample clusters to save travel costs.

In general, for ''H'' strata, a weighted sample mean is

: \bar x_w = \sum_{h=1}^H W_h \bar x_h,

with

: \operatorname{Var}(\bar x_w) = \sum_{h=1}^H W_h^2 \operatorname{Var}(\bar x_h).

The weights, W_h, frequently, but not always, represent the proportions of the population elements in the strata, and W_h = N_h/N. For a fixed sample size, that is n = \sum n_h,

: \operatorname{Var}(\bar x_w) = \sum_{h=1}^H W_h^2 \operatorname{Var}(\bar x_h) \left(\frac{1}{n_h} - \frac{1}{N_h}\right),

which can be made a minimum if the sampling rate within each stratum is made proportional to the standard deviation within each stratum: n_h/N_h = k S_h, where S_h = \sqrt{\operatorname{Var}(\bar x_h)} and k is a constant such that \sum n_h = n.

An "optimum allocation" is reached when the sampling rates within the strata are made directly proportional to the standard deviations within the strata and inversely proportional to the square root of the sampling cost per element within the strata, C_h:

: \frac{n_h}{N_h} = \frac{K S_h}{\sqrt{C_h}},

where K is a constant such that \sum n_h = n, or, more generally, when

: n_h = \frac{K' W_h S_h}{\sqrt{C_h}}.
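The allocation rule can be illustrated with a short Python sketch that makes n_h proportional to N_h S_h / \sqrt{C_h} and scales the result so that the sizes sum to ''n''. The function name and the example strata are hypothetical; with equal costs the rule reduces to sampling rates proportional to S_h. The returned values are fractional, and rounding them to integers is left to the user.

```python
import math


def optimum_allocation(n, stratum_sizes, stratum_sds, unit_costs=None):
    """Split a total sample size n across strata with n_h ∝ N_h * S_h / sqrt(C_h)."""
    if unit_costs is None:
        # Equal costs: sampling rate n_h / N_h is proportional to S_h alone.
        unit_costs = [1.0] * len(stratum_sizes)
    scores = [
        N_h * S_h / math.sqrt(C_h)
        for N_h, S_h, C_h in zip(stratum_sizes, stratum_sds, unit_costs)
    ]
    total = sum(scores)
    # Scaling by n / total plays the role of the constant K in the text.
    return [n * score / total for score in scores]


# Hypothetical population: three strata with differing spread.
sizes = [500, 300, 200]
sds = [10.0, 20.0, 30.0]
for n_h in optimum_allocation(100, sizes, sds):
    print(round(n_h, 2))
```

In this example the smallest stratum receives as large a share as the middle one, because its larger standard deviation offsets its smaller size.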

Qualitative research

Sample size determination in qualitative studies takes a different approach. It is generally a subjective judgment, taken as the research proceeds. One approach is to continue to include further participants or material until saturation is reached. The number needed to reach saturation has been investigated empirically.

There is a paucity of reliable guidance on estimating sample sizes before starting the research, with a range of suggestions given. A tool akin to a quantitative power calculation, based on the negative binomial distribution, has been suggested for thematic analysis (Galvin R (2015). How many interviews are enough? Do qualitative interviews in building energy consumption research produce reliable knowledge? Journal of Building Engineering, 1:2–12).

